Quiz on GFS
Test your understanding of concepts related to the design of Google File System via a quiz.
Question 5
How can a client find the right chunk in the presence of padding?
Because of padding and record duplicates, the application-level file size is less than or equal to the number of bytes the system uses to store that file. To read a particular data byte, a client has to find the chunk that contains it. Since chunks may hold padding and duplicate records, dividing the byte offset by the chunk size does not necessarily land on the right chunk.
The padding and record duplicates can be identified using checksums and special markings that are stored with the data.
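As a rough illustration of how a reader could filter padding and duplicates, the sketch below frames each record with a length, a CRC32 checksum, and a unique record ID. The framing format, the helper names `encode_record` and `read_valid_records`, and the use of zero bytes as padding are assumptions made for this example, not the format used by GFS or any particular application.

```python
import struct
import zlib

# Hypothetical record header: payload length, CRC32 of payload, unique record ID.
HEADER = struct.Struct(">IIQ")


def encode_record(record_id: int, payload: bytes) -> bytes:
    """Frame a non-empty payload so that readers can validate it later."""
    return HEADER.pack(len(payload), zlib.crc32(payload), record_id) + payload


def read_valid_records(chunk: bytes, seen_ids: set) -> list:
    """Return payloads from one chunk, dropping padding and duplicate records."""
    records = []
    pos = 0
    while pos + HEADER.size <= len(chunk):
        length, checksum, record_id = HEADER.unpack_from(chunk, pos)
        start = pos + HEADER.size
        payload = chunk[start:start + length]
        # A zero length, a truncated payload, or a checksum mismatch is treated
        # as padding; this toy reader then ignores the rest of the chunk.
        if length == 0 or len(payload) != length or zlib.crc32(payload) != checksum:
            break
        # A record ID that was already seen marks a duplicate append.
        if record_id not in seen_ids:
            seen_ids.add(record_id)
            records.append(payload)
        pos = start + length
    return records


seen = set()
# Chunk 1: two records, one retried (duplicated) append, then zero padding.
chunk_1 = (encode_record(1, b"a") + encode_record(2, b"bb")
           + encode_record(2, b"bb") + bytes(32))
# Chunk 2: the retried record appears again, followed by a new record.
chunk_2 = encode_record(2, b"bb") + encode_record(3, b"c")

first = read_valid_records(chunk_1, seen)
second = read_valid_records(chunk_2, seen)
```

Sharing `seen_ids` across chunks lets the reader discard a duplicate even when the retried append landed in a different chunk than the original.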
If a client starts a sequential read from a random offset, or performs a small random read, it needs to know which chunk or chunks contain the requested data bytes. The client doesn’t know how much padding or how many record duplicates the file’s chunks contain, so it can neither compute the right chunk directly nor begin its search from an estimated chunk. Instead, it has to iterate over the chunks from the start of the file until it reaches the one holding the requested byte, which is very costly if done on every read. It is up to applications to decide how to handle this problem on their side.
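The gap between the arithmetic guess and the scan from the start can be seen in a toy model. In the sketch below, the chunk size, the per-chunk counts of valid bytes, and both function names are made up for illustration; a real client would learn how many valid bytes a chunk holds only by reading it and discarding padding and duplicates.

```python
CHUNK_SIZE = 64  # toy chunk size in bytes, chosen only to keep the numbers small

# Hypothetical layout: application-visible bytes stored in each chunk.
# The remainder of each chunk is padding or duplicate records.
valid_bytes_per_chunk = [64, 40, 64, 10, 64]


def naive_chunk_index(app_offset: int) -> int:
    """Guess the chunk by plain division, ignoring padding and duplicates."""
    return app_offset // CHUNK_SIZE


def scan_for_chunk(app_offset: int) -> int:
    """Walk the chunks from the start, counting only valid bytes."""
    seen = 0
    for index, valid in enumerate(valid_bytes_per_chunk):
        if app_offset < seen + valid:
            return index
        seen += valid
    raise ValueError("offset is past the end of the file")


guess = naive_chunk_index(170)   # division lands on chunk 2
actual = scan_for_chunk(170)     # byte 170 actually lives in chunk 3
```

The scan gives the right answer but touches every earlier chunk, which is the cost described above.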
An application might embed hints in the records as it writes them, which later readers can use to approximate the application-level byte index.
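One possible form of such hints, sketched here under assumptions, is a sparse list of checkpoints that map an application-level offset to the chunk where that offset begins. The `hints` list, the toy chunk layout, and the `locate` function are hypothetical; since the file may still be changing, the hint is used only as a starting point and the reader scans forward from it.

```python
import bisect

# Hypothetical layout: application-visible bytes stored in each chunk.
valid_bytes_per_chunk = [64, 40, 64, 10, 64]

# Hypothetical hints left by the writer, sorted by offset:
# (application offset of the first valid byte in a chunk, chunk index).
hints = [(0, 0), (104, 2), (178, 4)]


def locate(app_offset: int) -> tuple:
    """Return (chunk index, chunks inspected) for an application-level offset."""
    offsets = [offset for offset, _ in hints]
    # Pick the last hint whose offset is not beyond the requested byte.
    pos = bisect.bisect_right(offsets, app_offset) - 1
    if pos < 0:
        raise ValueError("offset precedes the first hint")
    seen, start = hints[pos]
    # Scan forward from the hinted chunk instead of from the start of the file.
    for index in range(start, len(valid_bytes_per_chunk)):
        valid = valid_bytes_per_chunk[index]
        if app_offset < seen + valid:
            return index, index - start + 1
        seen += valid
    raise ValueError("offset is past the end of the file")


chunk, inspected = locate(170)
```

With these toy numbers, byte 170 is found after inspecting two chunks rather than four. Denser hints shorten the forward scan at the cost of more metadata per record.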
We encourage you to brainstorm other ways to find out the application-level byte index efficiently when the underlying file might be mutating concurrently.